The minimum Euclidean-norm point in a convex polytope: Wolfe’s combinatorial algorithm is exponential
Abstract
The complexity of Philip Wolfe’s method for the minimum Euclidean-norm point problem over a convex polytope has remained unknown since he proposed the method in 1974. The method is important because it is used as a subroutine for one of the most practical algorithms for submodular function minimization. We present the first example showing that Wolfe’s method takes exponential time. Additionally, we improve previous results to show that linear programming reduces in strongly-polynomial time to the minimum norm point problem over a simplex.
The fundamental algorithmic problem we consider here is: given a convex polytope $P \subseteq \mathbb{R}^d$, find its point of minimum Euclidean norm, i.e., the closest point to the origin, or what we call its minimum norm point for short. We assume $P$ is presented as the convex hull of finitely many points $p_1, p_2, \dots, p_n$ (not necessarily in convex position). We wish to find
$$\arg\min \{\, \|x\|_2 : x \in \mathrm{conv}(p_1, \dots, p_n) \,\}.$$
Finding the minimum norm point in a polytope is a basic auxiliary step in several algorithms arising in many areas of optimization and machine learning; a subroutine for solving the minimum norm point problem can be used to compute the projection of an arbitrary point $b$ onto a polytope $P$ (indeed, projecting $b$ onto $P$ is the same as finding the minimum norm point of the translated polytope $P - b$ and translating back). The minimum norm problem additionally appears in combinatorial optimization, e.g., in the nearest point problem for transportation polytopes [1, 4], and as a vital subroutine in Bárány and Onn’s approximation algorithm for the colorful linear programming problem [2]. One of the most important reasons to study this problem is that the minimum norm problem can be used as a subroutine for submodular function minimization through projection onto the base polytope, as proposed by Fujishige [8]. Submodular minimization is useful in machine learning, where applications such as large-scale learning and vision require efficient and accurate solutions [16]. The problem also appears in optimal loading of recursive neural networks [6]. The Fujishige-Wolfe algorithm is currently considered an important practical algorithm in applications [10, 9, 5]. Furthermore, Fujishige et al. first observed that linear programs may be solved by solving the minimum norm point problem [9], so this simple geometric problem is also relevant to the theory of algorithmic complexity of linear optimization.
One may ask about the complexity of other closely related problems. First, it is worth remembering that $\ell_p$-norm minimization over a polyhedron is NP-hard for $0 < p < 1$ (see [11] and the references therein), while for $p \ge 1$ the convexity of the norm allows for fast computation. One can prove that it is NP-hard to find a closest vertex of a convex polytope given by inequalities. The reduction for hardness is from the directed Hamiltonian path problem: given a directed graph $G$ and two distinct vertices $s, t$, one aims to decide whether $G$ contains a directed Hamiltonian path from $s$ to $t$. It is well-known that there is a polytope represented by inequalities whose vertices correspond to the characteristic vectors of directed paths joining $s$ to $t$ in $G$. See e.g., Proposition 2.6 in the book [17] for the details, including the explicit inequality description of this polytope. Finally, by the change of variable $y = \mathbf{1} - x$, changing zeros to ones and vice versa, the vertex of minimum Euclidean norm becomes precisely the “longest path from $s$ to $t$”, solving the directed Hamiltonian path problem.
Since the square of the Euclidean norm is a convex quadratic form, the minimum norm point problem is a special case of convex quadratic optimization. Indeed, it is well-known that a convex quadratic programming problem can be approximately solved in polynomial time; that is, some point within distance $\epsilon$ of the desired minimizing point may be found in time polynomial in the input size and $\log(1/\epsilon)$. This can be done through several iterative (convergent) algorithms, such as the ellipsoid method [15] and interior-point method techniques [3]. Both are methods whose complexity depends upon the desired accuracy. However, an approximate numerical solution is inconvenient when the application requires more information, e.g., when we need to know the face that contains the minimum norm point. Numeric methods that converge to a solution and require further rounding are not well suited to this need.
In this paper, we focus on combinatorial algorithms that rely on the structure of the polytope. There are several reasons to study the complexity of combinatorial algorithms for the minimum norm problem. On the one hand, the minimum norm problem can indeed be solved in strongly-polynomial time for some polytopes, most notably network-flow and transportation polytopes (see [4, 1, 22] and references therein for details). On the other hand, while linear programming reduces to the minimum norm problem, it is unknown whether linear programming can be solved in strongly-polynomial time [20]; thus the complexity of the minimum norm point problem could also impact the algorithmic efficiency of linear programming and optimization in general. For all these reasons it is natural to ask whether a strongly-polynomial time algorithm exists for the minimum norm problem over general polytopes.
Our contributions:
- In 1974, Philip Wolfe proposed a combinatorial method that solves the minimum norm point problem exactly [24, 23]. Since then, the complexity of Wolfe’s method has not been understood. In Section 1 we present our main contribution: the first example on which Wolfe’s method exhibits exponential behavior. This is akin to the well-known Klee-Minty examples showing exponential behavior for the simplex method [14]. Prior work by [5] showed that after polynomially many iterations (in terms of $1/\epsilon$), Wolfe’s method returns an $\epsilon$-approximate solution to the minimum norm point problem on any polytope.
- As we mentioned earlier, an enticing reason to explore the complexity of the minimum norm problem is its intimate link to the complexity of linear programming. It is known that linear programming can be polynomially reduced to the minimum norm point problem [10]. In Section 2, we strengthen earlier results by showing that linear optimization is strongly-polynomial-time reducible to the minimum norm point problem on a simplex.
1 Wolfe’s method exhibits exponential behavior
For convenience of the reader and to set up notation, we start with a brief description of Wolfe’s method. We will then describe our exponential example in detail, proving the exponential behavior of Wolfe’s method. First, we review two important definitions. Given a set of points $S = \{p_1, \dots, p_k\} \subseteq \mathbb{R}^d$, there are two minimum norm points to consider. One is the affine minimizer, the point of minimum norm in the affine hull $\mathrm{aff}(S)$. The second is the convex minimizer, the point of minimum norm in the convex hull $\mathrm{conv}(S)$. Note that finding the convex minimizer of a set of points is exactly the problem we are solving, while the affine minimizer of a set of points is easily computable via linear algebra.
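The affine minimizer can indeed be found by solving a single linear system: writing the minimizer as $y = \sum_i \alpha_i p_i$ with $\sum_i \alpha_i = 1$, the optimality condition $y^T p = \|y\|^2$ for all $p \in S$ becomes a linear system in the Gram matrix of $S$. The following pure-Python sketch is our own illustration (the function names and layout are not from [24]):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def affine_minimizer(S):
    """Minimum norm point y = sum_i alpha_i S[i] of aff(S).
    Solves G alpha + nu 1 = 0, 1^T alpha = 1 (G the Gram matrix),
    which encodes y . p = ||y||^2 for every p in S."""
    k = len(S)
    A = [[dot(S[i], S[j]) for j in range(k)] + [1.0] for i in range(k)]
    A.append([1.0] * k + [0.0])
    sol = solve(A, [0.0] * k + [1.0])
    alpha = sol[:k]
    y = [sum(alpha[i] * S[i][c] for i in range(k)) for c in range(len(S[0]))]
    return y, alpha

y, alpha = affine_minimizer([(3.0, 0.0), (0.0, 3.0)])
print([round(v, 9) for v in y], [round(a, 9) for a in alpha])  # [1.5, 1.5] [0.5, 0.5]
```

If, in addition, every coefficient $\alpha_i$ is strictly positive, the affine minimizer lies in the relative interior of the convex hull of $S$.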
1.1 A brief review of Wolfe’s combinatorial method
Wolfe’s method is a combinatorial method for solving the minimum norm point problem over a polytope, introduced by P. Wolfe in [24]. The method iteratively solves the problem over a sequence of subsets of no more than $d + 1$ affinely independent points among $p_1, \dots, p_n$, and it checks whether the solution to the subproblem is a solution to the problem over $\mathrm{conv}(p_1, \dots, p_n)$ using the following lemma due to Wolfe. We call this Wolfe’s criterion.
Lemma 1 (Wolfe’s criterion [24]).
Let $P = \{p_1, \dots, p_n\} \subseteq \mathbb{R}^d$ and let $x \in \mathrm{conv}(P)$. Then $x$ is the minimum norm point in $\mathrm{conv}(P)$ if and only if $x^T p_j \ge \|x\|_2^2$ for all $j = 1, \dots, n$.
Note that this tells us that if there exists a point $p_j$ with $x^T p_j < \|x\|_2^2$, then $x$ is not the minimum norm point in $\mathrm{conv}(P)$. We say that $p_j$ violates Wolfe’s criterion, and inserting this point should decrease the norm of the solution of the current subproblem.
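Once the point set is explicit, Wolfe's criterion is cheap to test; as a small sketch (helper names are our own):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def violators(x, P, tol=1e-12):
    """Points p of P with x . p < ||x||^2, certifying that x is not
    the minimum norm point of conv(P) (Wolfe's criterion)."""
    xx = dot(x, x)
    return [p for p in P if dot(x, p) < xx - tol]

P = [(0.0, 3.0), (3.0, 0.0), (2.0, 2.0)]
print(violators((1.5, 1.5), P))   # []  -> (1.5, 1.5) is optimal
print(violators((2.0, 2.0), P))   # [(0.0, 3.0), (3.0, 0.0)]
```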
It should be observed that just as Wolfe’s criterion is a rule to decide optimality over $\mathrm{conv}(P)$, one has a very similar rule for deciding optimality over the affine hull, $\mathrm{aff}(P)$.
Lemma 2 (Wolfe’s criterion for the affine hull).
Let $S \subseteq \mathbb{R}^d$ be a non-empty finite set of points. Then $x$ is the minimum norm point in $\mathrm{aff}(S)$ iff for all $p \in S$ we have $x^T p = \|x\|_2^2$.
Proof.
($\Leftarrow$) Let $y = \sum_i \lambda_i p_i$ with $\sum_i \lambda_i = 1$ be an arbitrary point in $\mathrm{aff}(S)$, and suppose $x^T p_i = \|x\|^2$ for all $i$. We have
$$x^T y = \sum_i \lambda_i\, x^T p_i = \|x\|^2.$$
Then $0 \le \|y - x\|^2 = \|y\|^2 - 2 x^T y + \|x\|^2 = \|y\|^2 - \|x\|^2$, and so $\|y\| \ge \|x\|$.

($\Rightarrow$) Suppose $x$ is the minimum norm point in $\mathrm{aff}(S)$, and suppose that $x^T p \ne \|x\|^2$ for some $p \in S$. First, consider the case $x^T p < \|x\|^2$ and define $z_\epsilon = (1 - \epsilon)x + \epsilon p \in \mathrm{aff}(S)$ for $\epsilon > 0$. Then we have
$$\|z_\epsilon\|^2 = \|x\|^2 + 2\epsilon\,(x^T p - \|x\|^2) + \epsilon^2 \|p - x\|^2 < \|x\|^2$$
for sufficiently small $\epsilon > 0$, since $x^T p - \|x\|^2 < 0$. This contradicts our assumption that $x$ is the minimum norm point in $\mathrm{aff}(S)$. The case $x^T p > \|x\|^2$ is likewise handled by considering $z_\epsilon$ with $\epsilon < 0$ small in absolute value. Thus, we have $x^T p = \|x\|^2$ for all $p \in S$. ∎
We say a set of affinely independent points $C$ is a corral if the affine minimizer of $C$ lies in the relative interior of $\mathrm{conv}(C)$. Note that singletons are always corrals. Carathéodory’s theorem implies that the minimum norm point of $\mathrm{conv}(p_1, \dots, p_n)$ will lie in the convex hull of some corral of points among $p_1, \dots, p_n$. The goal of Wolfe’s method is to search for a corral containing the (unique) minimizing point.
The pseudo-code in Method 1 below presents the iterations of Wolfe’s method. It is worth noticing that some steps of the method can be implemented in more than one way, and Wolfe proved that all of them lead to a correct algorithm (for example, the choice of the initial point in line 2). We therefore use the word method to encompass all these variations, and we discuss specific choices when they are relevant to our analysis of the method.
The subset of points being considered as the potential corral is maintained in the set $C$. Iterations of the outer loop, where points are added to $C$, are called major cycles, and iterations of the inner loop, where points are removed from $C$, are called minor cycles. The potential corral, $C$, is named so because at the beginning of a major cycle it is guaranteed to be a corral, while within the minor cycles it may or may not be a corral. Intuitively, a major cycle of Wolfe’s method inserts an improving point which violates Wolfe’s criterion (a point $p_j$ so that $x^T p_j < \|x\|^2$) into $C$, then the minor cycles remove points until $C$ is a corral, and this process is repeated until no points are improving and $C$ is guaranteed to be a corral containing the minimizer.
It can be shown that this method terminates because the norm of the convex minimizer of the corrals visited monotonically decreases and thus no corral is visited twice [24]. Like [5], we sketch the argument in [24]. One may see that the norm monotonically decreases by noting that the current minimizer $x$ may be updated in one of two places: at the end of a major cycle or at the end of a minor cycle. Let $C$ be the corral at the beginning of a major cycle (line 3 of Method 1) and let $x$ be the current minimizer; then the affine minimizer $y$ of $C \cup \{p_j\}$, where $p_j$ is the added improving point, has norm strictly less than that of $x$ by Lemma 2, uniqueness of the affine minimizer, and the fact that $x \in \mathrm{aff}(C \cup \{p_j\})$. Now, either $x$ is updated to $y$ or a minor cycle begins. Let $C$ be the potential corral at the beginning of a minor cycle (line 6 of Method 1), let $x$ be the current convex combination of points of $C$, and let $y$ be the affine minimizer of $C$. The updated point $z$ is a proper convex combination of $x$ and $y$, and since $\|y\| < \|x\|$, we have $\|z\| < \|x\|$. Thus, we see that every update of $x$ decreases its norm. Note that the number of minor cycles within any major cycle is bounded by $d + 1$, where $d$ is the dimension of the space. Thus, the total number of iterations is bounded by the number of corrals visited multiplied by $d + 1$. It is nevertheless not clear how the number of corrals visited grows, beyond the trivial exponential bound on the number of affinely independent subsets.
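To make the major/minor cycle structure concrete, here is a compact, self-contained pure-Python sketch of the method with the minnorm insertion rule (our own illustrative code: tie-breaking, tolerances, and data layout are arbitrary choices, and no attempt is made at the numerical safeguards a serious implementation would need):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def affine_minimizer(S):
    """Minimum norm point of aff(S) and its affine coefficients."""
    k = len(S)
    A = [[dot(S[i], S[j]) for j in range(k)] + [1.0] for i in range(k)]
    A.append([1.0] * k + [0.0])
    sol = solve(A, [0.0] * k + [1.0])
    alpha = sol[:k]
    y = [sum(alpha[i] * S[i][c] for i in range(k)) for c in range(len(S[0]))]
    return y, alpha

def wolfe_minnorm(points, tol=1e-9):
    """Minimum norm point of conv(points), minnorm insertion rule."""
    pts = sorted(points, key=lambda p: dot(p, p))   # increasing norm
    C, lam, x = [list(pts[0])], [1.0], list(pts[0])
    while True:
        # Major cycle: insert the smallest-norm point violating
        # Wolfe's criterion; stop if there is none.
        improving = [p for p in pts if dot(x, p) < dot(x, x) - tol]
        if not improving:
            return x, C
        C.append(list(improving[0]))
        lam.append(0.0)
        while True:
            # Minor cycle: move toward the affine minimizer of C,
            # dropping points whose coefficient would become negative.
            y, alpha = affine_minimizer(C)
            if all(a > tol for a in alpha):
                x, lam = y, alpha
                break
            theta = min(lam[i] / (lam[i] - alpha[i])
                        for i in range(len(C)) if alpha[i] <= tol)
            lam = [theta * alpha[i] + (1 - theta) * lam[i] for i in range(len(C))]
            keep = [i for i in range(len(C)) if lam[i] > tol]
            C = [C[i] for i in keep]
            lam = [lam[i] for i in keep]
            x = [sum(lam[i] * C[i][c] for i in range(len(C)))
                 for c in range(len(x))]

x, C = wolfe_minnorm([(0.0, 3.0), (3.0, 0.0), (2.0, 2.0)])
print([round(v, 6) for v in x])   # [1.5, 1.5]
```

On this toy triangle, the run illustrates a minor cycle: the initial point $(2,2)$ enters the corral first (it has smallest norm) and is later dropped when the segment between $(0,3)$ and $(3,0)$ becomes the optimal corral.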
Within the method, there are two moments at which one may choose which points to add to the potential corral. Observe that at line 2 of the pseudocode, one may choose which initial point to add to the potential corral. In this paper we will only consider one initialization rule, which is to initialize with the point of minimum norm. Observe that at line 4 of the pseudocode, there are several potential choices of which point to add to the potential corral. Two important examples of insertion rules are, first, the minnorm rule, which dictates that one chooses, out of the improving points for the potential corral, the point of minimum norm. Second, the linopt rule dictates that one chooses, out of the improving points for the potential corral, the point minimizing the linear functional $p \mapsto x^T p$, where $x$ is the current minimizer. Notice that insertion rules are to Wolfe’s method what pivot rules are to the simplex method (see [21] for a summary).
As with pivot rules, there are advantages and disadvantages to insertion rules. For example, the minnorm rule has the advantage that its implementation only requires an initial ordering of the points; then in each iteration it need only search for an improving point in order of increasing norm and add the first one found. However, the linopt insertion rule has the advantage that, if the polytope is given in H-representation (intersection of halfspaces) rather than V-representation (convex hull of points), one may still perform Wolfe’s method by using linear programming to find the point minimizing $x^T p$ over the polytope. In other words, Wolfe’s method does not need the list of vertices explicitly; it suffices to have a linear programming oracle that provides the new vertex to be inserted. This feature of Wolfe’s method means that each iteration can be implemented efficiently even for certain polyhedra having too many vertices and facets: specifically, over zonotopes (presented as a Minkowski sum of segments) [9] and over the base polyhedron of a submodular function [8].
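The two insertion rules differ only in how they score the improving points; schematically (toy code, names and data our own):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def improving(x, P, tol=1e-12):
    """Points violating Wolfe's criterion at the current minimizer x."""
    return [p for p in P if dot(x, p) < dot(x, x) - tol]

def minnorm_rule(x, P):
    """Insert the improving point of smallest norm."""
    return min(improving(x, P), key=lambda p: dot(p, p))

def linopt_rule(x, P):
    """Insert the improving point minimizing the linear functional x . p."""
    return min(improving(x, P), key=lambda p: dot(x, p))

P = [(0.0, 3.0), (3.0, -1.0), (2.0, 2.0)]
x = (2.0, 2.0)                    # current iterate, ||x||^2 = 8
print(minnorm_rule(x, P))         # (0.0, 3.0)
print(linopt_rule(x, P))          # (3.0, -1.0)
```

Here both $(0,3)$ and $(3,-1)$ are improving, but the two rules pick different points, which is exactly why the choice of rule can change the path the method takes.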
Now we present examples showing that the optimal choice of insertion rule depends on the input data. We first present a simple example where the minnorm rule outperforms the linopt rule; that is, the minnorm insertion rule is not at an obvious disadvantage to the linopt rule. In Section 1.2, we present a family of examples on which the minnorm rule takes exponential time, while we expect the linopt rule to take polynomial time on this family.
The first example shows that instances can have different performance depending on the choice of insertion rule. Consider the simplex shown in Figure 1 (we present the coordinates of its vertices in the figure’s caption). We list the steps of Wolfe’s method for the minnorm and linopt insertion rules in Tables 1 and 2, and demonstrate a single step from each set of iterations in Figure 2. Each row lists the major and minor cycle iteration numbers, the vertices in the potential corral $C$, and the values of $x$ and $\|x\|$ at the end of the iteration (before the insertion, for major cycles). Note that one vertex is added to the potential corral twice with the linopt insertion rule, as evidenced in Table 2.
| Major Cycle | Minor Cycle | $C$ | $x$ | $\|x\|$ |
|---|---|---|---|---|
| 0 | 0 | | | |
| 1 | 0 | | | |
| 2 | 0 | | | |
| 3 | 0 | | | |
| 3 | 1 | | | |
| Major Cycle | Minor Cycle | $C$ | $x$ | $\|x\|$ |
|---|---|---|---|---|
| 0 | 0 | | | |
| 1 | 0 | | | |
| 2 | 0 | | | |
| 2 | 1 | | | |
| 3 | 0 | | | |
| 4 | 0 | | | |
| 4 | 1 | | | |


Currently, there are examples of exponential behavior for the simplex method under every known deterministic pivot rule. It is our aim to provide the same for insertion rules of Wolfe’s method. In the next subsection we present the first exponential-time example, using the minnorm insertion rule.
1.2 An exponential lower bound for Wolfe’s method
To understand our hard instance, it is helpful to consider first a simple instance that exhibits an inefficiency of Wolfe’s method: a set of 4 points in $\mathbb{R}^3$ in which a point leaves and reenters the current corral. If one labels the points $1, 2, 3, 4$, then in the sequence of corrals produced by the minnorm rule, point 1 enters, leaves, and reenters. (For succinctness, a set of points such as $\{1, 2\}$ may be denoted $12$.) The idea now is to recursively replace point 1 (the point that reenters) in this construction by a recursively constructed set of points whose corrals are then considered twice by Wolfe’s method. To simplify the proof, our construction uses a variation of this set of 4 points with an additional point and modified coordinates. This modified construction is depicted in Fig. 3, where point 1 corresponds to a recursively constructed set of points, points 2 and 3 correspond to two of the new points, and point 4 corresponds to the two remaining new points.
The high-level idea of our exponential lower bound example is the following. We inductively define a sequence of instances of the minimum norm point problem of increasing dimension. Given an instance in dimension $d$, we add a few dimensions and points so that, when given to Wolfe’s method, the augmented instance in dimension $d+2$ has about twice the number of corrals of the input instance in dimension $d$. More precisely, our augmentation procedure takes an instance $P(d)$ in $\mathbb{R}^d$, adds two new coordinates, and adds four points, which we denote $q_1, q_2, q_3, q_4$, to get an instance $P(d+2)$ in $\mathbb{R}^{d+2}$.
The points are defined so that the method on instance $P(d+2)$ goes first through every corral given by the points of the prior configuration $P(d)$, and then goes to a corral formed by new points. To achieve this under the minnorm rule, the four new points have greater norm than any point in $P(d)$, and they are in the geometric configuration sketched in Fig. 3.


At this time, no point of $P(d)$ is in the current corral and so, if a point of $P(d)$ is part of the optimal corral, it will have to reenter, which is expensive. The remaining new points are defined so that a corral consisting of the two last new points is reached next, but now every point of $P(d)$ is improving according to Wolfe’s criterion and may enter again. Specifically, every corral visited for $P(d)$, with the two last new points appended, is visited again.
Before we start describing the exponential example in detail, we wish to review preliminary lemmas of independent interest which will be used in the arguments. The first lemma demonstrates that orthogonality between finite point sets allows us to easily describe the affine minimizer of their union. Figure 4 shows two such situations, one in which the affine hull of the union of the point sets spans all of $\mathbb{R}^d$ and one in which it does not.


Lemma 3.
Let $V \subseteq \mathbb{R}^d$ be a proper linear subspace. Let $P \subseteq V$ be a non-empty finite set. Let $Q \subseteq V^\perp$ be another non-empty finite set. Let $p$ be the minimum norm point in $\mathrm{conv}(P)$. Let $q$ be the minimum norm point in $\mathrm{conv}(Q)$. Let $x$ be the minimum norm point in $\mathrm{conv}(P \cup Q)$. We have:

1. $x$ is the minimum norm point in $\mathrm{conv}(\{p, q\})$, and therefore, if $p \neq 0$ or $q \neq 0$, then $x = \mu p + (1 - \mu) q$ with $\mu = \|q\|^2 / (\|p\|^2 + \|q\|^2)$.

2. If $p \neq 0$ and $q \neq 0$, then $x$ is a strict convex combination of $p$ and $q$.

3. If $p \neq 0$, $q \neq 0$, and $P$ and $Q$ are corrals, then $P \cup Q$ is also a corral.
Proof.
If $p = q = 0$ then part 1 follows immediately. If at least one of $p, q$ is non-zero, then $p$ and $q$ are also distinct, by the orthogonality assumption. Given two distinct points $a, b$, one can show that the minimum norm point in the line through them is $a + t(b - a)$ where $t = -a^T(b - a)/\|b - a\|^2$. For points $p, q$ as in the statement, this gives that the minimum norm point in the line through $p$ and $q$ is $z = \mu p + (1 - \mu)q$ with $\mu = \|q\|^2/(\|p\|^2 + \|q\|^2) \in [0, 1]$. Thus, $z$ is also the minimum norm point in the segment $\mathrm{conv}(\{p, q\})$. We will now use the optimality conditions to conclude that $z = x$. Let $u \in P$. Then $z^T u$ can be computed in two steps: first project $z$ onto $V$ (a subspace that contains $u$); this projection is $\mu p$. Then, by optimality of $p$ (Lemma 1), $z^T u = \mu\, p^T u \ge \mu \|p\|^2 = \|z\|^2$. A similar calculation shows $z^T u \ge \|z\|^2$ for any $u \in Q$. We conclude, by Lemma 1, that $z$ is the minimum norm point in $\mathrm{conv}(P \cup Q)$, i.e., $z = x$. This proves part 1.
Part 2 follows from our expression for $\mu$ above, which lies in $(0, 1)$ when $p \neq 0$ and $q \neq 0$.
Under the assumptions of part 3, we have that $p$ is a strict convex combination of the points of $P$ and $q$ is a strict convex combination of the points of $Q$. This, combined with the conclusion of part 2, gives that $x$ is a strict convex combination of the points of $P \cup Q$. The claim in part 3 follows. ∎
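Part 1 of the lemma is easy to check numerically; in the sketch below (our own toy data), $p$ and $q$ are orthogonal, and the point $\mu p + (1-\mu)q$ attains the minimum squared norm over a fine grid of the segment between them:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

p = (3.0, 0.0)     # minimum norm point of a set P inside V (the x-axis)
q = (0.0, 1.0)     # minimum norm point of a set Q inside V-perp
mu = dot(q, q) / (dot(p, p) + dot(q, q))     # = 1/10
x = tuple(mu * a + (1 - mu) * b for a, b in zip(p, q))

# Brute-force check over the segment conv({p, q}).
best = min(dot(v, v)
           for lam in range(1001)
           for v in [tuple((lam / 1000) * a + (1 - lam / 1000) * b
                           for a, b in zip(p, q))])
print(round(dot(x, x), 12), round(best, 12))   # 0.9 0.9
```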
The following lemma shows conditions under which, if we have a corral and a new point that only has components along the minimum norm point of the corral and along new coordinates, then the corral with the new point added is also a corral. Moreover, the new minimum norm point is a convex combination of the old minimum norm point and the added point. Figure 5 gives an example of such a situation in $\mathbb{R}^3$. Denote by $\mathrm{span}(S)$ the linear span of the set $S$.
Lemma 4.
Let $C$ be a finite set of points that is a corral. Let $x$ be the minimum norm point in $\mathrm{conv}(C)$. Let $p \in \mathrm{span}(\{x\}) + \mathrm{span}(C)^\perp$, and assume $x^T p < \|x\|^2$ and $x^T p < \|p\|^2$. Then $C \cup \{p\}$ is a corral. Moreover, the minimum norm point in $\mathrm{conv}(C \cup \{p\})$ is a (strict) convex combination of $p$ and the minimum norm point $x$ of $\mathrm{conv}(C)$: it is $x + t(p - x)$ with $t \in (0, 1)$.
Proof.
Let $z$ be the minimum norm point in $\mathrm{conv}(C \cup \{p\})$. Intuitively, $z$ should be the minimum norm point in the line through $x$ and $p$. We will characterize this point and show that it is a strict convex combination of $x$ and $p$ (which implies that $C \cup \{p\}$ is a corral). Given two points $a, b$, one can show that the minimum norm point in the line through them is $a + t(b - a)$ where $t = -a^T(b - a)/\|b - a\|^2$. Thus, we define $z = x + t(p - x)$ with $t = -x^T(p - x)/\|p - x\|^2$. By definition we have $z^T x = z^T p = \|z\|^2$.
The minimality of the norm of $z$ follows from the optimality condition in Lemma 2. It holds by construction for $x$ and $p$. It also holds for any $u \in C$: the component of $z$ along $\mathrm{span}(C)$ is a multiple $\lambda x$ of $x$, since $p$ (and hence $z$) has no component in $\mathrm{span}(C)$ outside $\mathrm{span}(\{x\})$. This shows that $z^T u = \lambda\, x^T u = \lambda \|x\|^2 = z^T x = \|z\|^2$ (the second equality by optimality of $x$ in $\mathrm{aff}(C)$, Lemma 2). We conclude that $z$ is of minimum norm in $\mathrm{aff}(C \cup \{p\})$.
To conclude that $C \cup \{p\}$ is a corral, we show that $z$ is a strict convex combination of the points $C \cup \{p\}$. It is enough to show that $z$ is a strict convex combination of $x$ and $p$, i.e., that $t \in (0, 1)$. We have $t > 0$ because $x^T(p - x) < 0$ by assumption. We also have $t < 1$ because $p^T(x - p) < 0$ by assumption. ∎
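The formula for the minimum norm point on the line through two points, used in the proofs above, can be verified numerically (example points are ours); the resulting point is orthogonal to the direction $b - a$, which is exactly the optimality condition:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def line_minimizer(a, b):
    """Minimum norm point a + t (b - a) on the line through a and b,
    with t = -a.(b - a) / ||b - a||^2."""
    d = tuple(y - x for x, y in zip(a, b))
    t = -dot(a, d) / dot(d, d)
    return tuple(x + t * y for x, y in zip(a, d)), t

z, t = line_minimizer((2.0, 2.0), (0.0, 3.0))
print(z, round(t, 6))                    # (1.2, 2.4) 0.4
# Optimality: z is orthogonal to the direction b - a = (-2, 1).
print(round(dot(z, (-2.0, 1.0)), 12))    # 0.0
```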
Our last lemma shows that if we have points in two orthogonal subspaces, $V$ and $V^\perp$, then adding points from $V^\perp$ to a set of points from $V$ does not cause any points from $V$ that previously did not violate Wolfe’s criterion to violate it. Figure 6 demonstrates this situation.
Lemma 5.
For a point $z$ define $H(z) = \{\, y \in \mathbb{R}^d : z^T y \ge \|z\|^2 \,\}$, the set of points that do not violate Wolfe’s criterion for $z$. Suppose that we have an instance of the minimum norm point problem in $\mathbb{R}^d$ as follows: some points, $P$, live in a proper linear subspace $V$ and some, $Q$, in $V^\perp$. Let $p$ be the minimum norm point in $\mathrm{conv}(P)$ and $x$ be the minimum norm point in $\mathrm{conv}(P \cup Q)$. Then $H(x) \cap V = H(p) \cap V$.
Proof.
Let $W$ be the span of $p$ and $V^\perp$. We first show $x \in W$. To see this, suppose not. Decompose $x$ as $x = x_V + x_\perp$, where $x_V \in V$ and $x_\perp \in V^\perp$. Decompose $x_V$ as $x_V = \lambda p + w$, where $w \neq 0$ and $w$ is orthogonal to $p$. Thus, $x = \lambda p + w + x_\perp$ with $w$ orthogonal to both $\lambda p$ and $x_\perp$. This implies that $x - w$ has a smaller norm than $x$ and is still a candidate minimizer. This is a contradiction.
To conclude, we have that $H(x) \cap V$ is a halfspace in $V$ whose normal is parallel to the projection of $x$ onto $V$. (It is helpful to recall how to compute the intersection of a halfspace with a subspace: if $H = \{y : a^T y \ge b\}$ and $V$ is a linear subspace, then $H \cap V = \{y \in V : (\mathrm{proj}_V\, a)^T y \ge b\}$. In other words, in order to intersect a halfspace with a subspace we project the normal.) That is, the normal is parallel to $p$. But that halfspace must also contain $p$ on its boundary. Thus, that halfspace is equal to $H(p) \cap V$. ∎
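A small numeric illustration of the lemma (our own toy instance, with $V$ the $x$-axis of $\mathbb{R}^2$ and $x$ computed via Lemma 3, part 1): the halfspace test of Wolfe's criterion agrees on all sampled points of $V$, whether taken with respect to $p$ or to $x$:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

p = (2.0, 0.0)   # minimum norm point of some P inside V (the x-axis)
q = (0.0, 1.0)   # minimum norm point of some Q inside V-perp
mu = dot(q, q) / (dot(p, p) + dot(q, q))          # Lemma 3, part 1
x = tuple(mu * a + (1 - mu) * b for a, b in zip(p, q))

def in_H(z, y):
    """Wolfe's criterion halfspace test: z . y >= ||z||^2."""
    return dot(z, y) >= dot(z, z) - 1e-12

# Points of V: membership in H(p) and H(x) agrees on all of them.
samples = [(s / 10.0, 0.0) for s in range(-50, 51)]
print(all(in_H(p, y) == in_H(x, y) for y in samples))   # True
```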
We will now describe our example in detail. The simplest version of our construction uses square roots and irrational numbers. We present instead a version with a few additional tweaks so that it involves only rational numbers.
For odd $d$, let $P(d)$ be a list of points in $\mathbb{R}^d$ defined inductively as follows. Let $x(d)$ denote the minimum norm point in $\mathrm{conv}(P(d))$. Let $M(d)$ be a rational upper bound to the maximum $\ell_2$-norm among the points in $P(d)$. (For a first reading one can let $M(d)$ be the maximum $\ell_2$-norm among points in $P(d)$, which leads to an essentially equivalent instance, except that it is not rational.) Similarly, let $m(d)$ be a rational lower bound to the minimum norm among points in $P(d)$. (Again, for a first reading one can define $m(d)$ to be that minimum norm, which leads to an essentially equivalent instance, except that it is not rational.)
We finally present the example. If we identify $P(d+2)$ with a matrix whose rows are the points, then $P(d+2)$ is given by a block matrix whose first rows are the points of $P(d)$ (padded with two zero coordinates) and whose last four rows are the four points $q_1, q_2, q_3, q_4$ of the configuration. For a picture of the three-dimensional case see Figure 7.


Remark:
First note that, strictly speaking, $P(d) \not\subseteq \mathbb{R}^{d+2}$, and that we are defining an embedding of it into $\mathbb{R}^{d+2}$, for which we have to use a recursive process. To avoid unnecessary notation, we will abuse notation: a point $p$ of $P(d)$ denotes both a point of $\mathbb{R}^d$ and its identical copy within $\mathbb{R}^{d+2}$, with two extra zero coordinates appended. Depending on the context, $p$ will be understood as either a $d$-dimensional or a $(d+2)$-dimensional vector (e.g., when taking dot products). The points of $P(d)$ become a subset of the point configuration $P(d+2)$ by padding with extra zeros. See Figures 3 and 8, which illustrate this embedding and explain our visualizations of these sets in three dimensions.
Theorem 6.
Consider the execution of Wolfe’s method with the minnorm insertion rule on input $P(d)$, where $d$ is odd. Then the sequence of corrals has length $\Omega(2^{d/2})$.
Proof.
Points in $P(d+2)$ are listed in order of increasing norm. Let $q_1, q_2, q_3, q_4$ denote the last four points of $P(d+2)$, respectively. Let $\Sigma(d)$ denote the ordered sequence of corrals in the execution of Wolfe’s method on $P(d)$. Let $C(d)$ denote the last (optimal) corral in $\Sigma(d)$.
The rest of the proof will establish that the sequence of corrals is
$$\Sigma(d+2) = \Sigma(d),\;\; C(d) \cup \{q_1\},\;\; \{q_1, q_2\},\;\; \{q_2, q_3\},\;\; \{q_3, q_4\},\;\; \Sigma(d) + \{q_3, q_4\}$$
(where a concatenation such as $\Sigma(d) + \{q_3, q_4\}$ denotes every corral in $\Sigma(d)$ with $q_3$ and $q_4$ added). After this sequence of corrals is established, we solve the resulting recurrence relation: let $t(d)$ denote the length of $\Sigma(d)$. We have $t(1) = 1$ and $t(d+2) = 2t(d) + 4$. This implies $t(d) = 5 \cdot 2^{(d-1)/2} - 4 = \Omega(2^{d/2})$ (with $d$ odd).
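Assuming a recurrence of the form $t(1) = 1$, $t(d+2) = 2t(d) + 4$ for the length of the corral sequence, its closed form can be sanity-checked numerically:

```python
def corral_count(d):
    """Length t(d) of the corral sequence from the recurrence
    t(1) = 1, t(d + 2) = 2 t(d) + 4 (odd d)."""
    t = 1
    for _ in range((d - 1) // 2):
        t = 2 * t + 4
    return t

# The closed form 5 * 2**((d-1)/2) - 4 matches the recurrence.
for d in (1, 3, 5, 7, 21):
    assert corral_count(d) == 5 * 2 ** ((d - 1) // 2) - 4
print(corral_count(21))   # 5 * 2**10 - 4 = 5116
```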
All we must show now to complete the proof of Theorem 6 is that the sequence of corrals has indeed the stated recursive form. We do this by induction on $d$. The steps of the proof are written as claims with individual proofs.
By construction, $\Sigma(d+2)$ starts with $\Sigma(d)$. This happens because the points are considered in order of increasing norm, and the proof proceeds inductively as follows. The first corral in $\Sigma(d+2)$ is the minimum norm point in $P(d+2)$, which is also the first corral in $\Sigma(d)$. Suppose now that the first $k$ corrals of $\Sigma(d+2)$ coincide with the first $k$ corrals of $\Sigma(d)$. We will show that corral $k+1$ in $\Sigma(d+2)$ is the same as corral $k+1$ in $\Sigma(d)$. To see this, it is enough to see that the set of points in $P(d+2)$ that can enter (the improving points) contains the point that enters in $\Sigma(d)$ (with two zeros appended) and contains no point of smaller norm. This two-part claim is true because the two new zero coordinates play no role here, and all points in $P(d+2)$ but not in $P(d)$ have a larger norm than any point in $P(d)$.
Once $C(d)$ is reached (with minimum norm point $x(d)$), the set of improving points, as established by Wolfe’s criterion, consists of $q_1, q_2, q_3, q_4$. Now, because we are using the minnorm insertion rule, the next point to enter is $q_1$.
Claim 1.
$C(d) \cup \{q_1\}$ is a corral.
Proof of Claim.
By construction, $q_1$ has components only along $x(d)$ and the two new coordinates, so the hypotheses of Lemma 4 hold with $C = C(d)$ and $p = q_1$, and the claim follows. ∎
Claim 2.
The next improving point to enter is $q_2$.
Proof of Claim.
We first check that no point in $P(d)$ can enter. From Lemma 4 we know the optimal point of the corral $C(d) \cup \{q_1\}$ explicitly in terms of the optimal point of $C(d)$ and $q_1$: namely, it is a convex combination $z = (1-t)\,x(d) + t\,q_1$ with $t \in (0, 1)$. Let $u \in P(d)$. We check that it cannot enter via Wolfe’s criterion. We compute $z^T u$ in two steps: first project $z$ onto $\mathbb{R}^d$ (a subspace that contains $u$); this projection is a positive multiple $\lambda\, x(d)$ of $x(d)$. Then $z^T u = \lambda\, x(d)^T u \ge \lambda \|x(d)\|^2 = z^T x(d) = \|z\|^2$, by optimality of $x(d)$ (Lemma 1) and of $z$ (Lemma 2). This shows that $z^T u \ge \|z\|^2$, and $u$ cannot enter, as it is not an improving point according to Wolfe’s criterion.
By construction, $q_2$ is closer to the origin than $q_3$ and $q_4$, so to conclude it is enough to check that $q_2$ is an improving point per Wolfe’s criterion. A direct computation with the coordinates of $q_2$ and the optimal point $z$ of the current corral shows that $z^T q_2 < \|z\|^2$; that is, $q_2$ is an improving point. ∎
Claim 3.
The current set of points, $C(d) \cup \{q_1, q_2\}$, is not a corral. The points of $C(d)$ leave one by one. The next corral is $\{q_1, q_2\}$.
Proof of Claim.
Instead of analyzing the iterations of Wolfe’s inner loop, we use the key fact, from Section 1.1, that the inner loop must end with a corral whose distance to the origin is strictly less than that of the previous corral. We look at the alternatives: the new corral cannot be $C(d) \cup \{q_1\}$ (the previous corral) or any subset of it, because that would not decrease the distance. An analysis similar to that of Claim 1, or basic trigonometry (in three dimensions), shows that $C(d) \cup \{q_2\}$ is a corral whose distance to the origin is larger than the distance for $C(d) \cup \{q_1\}$. See Fig. 9, where we show a projection; the perpendicular line segments to the relevant affine hulls are shown in dotted line after projection. Thus, the new corral cannot be $C(d) \cup \{q_2\}$ or any subset of it.
No set of the form $S \cup \{q_1, q_2\}$ with $S \subseteq C(d)$ and $S$ non-empty can be a corral. To see this, first note that the minimum norm point in $\mathrm{conv}(S \cup \{q_1, q_2\})$ is in the segment $[q_1, q_2]$ (minimality follows from Wolfe’s criterion, Lemma 1). This implies that the minimum norm point cannot be in the relative interior of $\mathrm{conv}(S \cup \{q_1, q_2\})$ when $S$ is non-empty (see Figure 10).
The only remaining non-empty subset is $\{q_1, q_2\}$, which is therefore the new corral. ∎


Claim 4.
The set of improving points is now $\{q_3, q_4\}$.
Proof of Claim.
Recall the form of the optimal point of the corral $\{q_1, q_2\}$. When computing distances and checking Wolfe’s criterion, it is therefore enough to work in the two-dimensional situation depicted in Figure 11, where a hyperplane orthogonal to the segment from the origin to this optimal point is shown. It leaves the points of $P(d)$ above it and both $q_3$ and $q_4$ below it, making $q_3$ and $q_4$ the only improving points. ∎
Point $q_3$ enters, since it has the smallest norm among the improving points.
Claim 5.
Point $q_1$ leaves and the next corral is $\{q_2, q_3\}$.
Proof of Claim.
To start, notice that by construction the four points $q_1, q_2, q_3, q_4$ lie on a common hyperplane $H$. Thus, one does not need to do distance calculations; rather, Fig. 13 is a faithful representation of the positions of the points. The origin faces the hyperplane $H$. The closest point to the origin within $H$ is in the line segment joining $q_1$ and $q_2$; thus, as we move vertically, the closest point to the origin in the triangle $q_1 q_2 q_3$ must be in the line segment joining $q_2$ and $q_3$. ∎
Claim 6.
The only improving point now is $q_4$.
Proof of Claim.
Once more we rely on two different orthogonal two-dimensional projections of the configuration to estimate distances and to check Wolfe’s criterion. The line segment from the origin to the optimum of the corral $\{q_2, q_3\}$ (we could calculate this optimum exactly, but it is not necessary) and its orthogonal hyperplane are shown in Figure 13. This shows that only $q_3$ and $q_4$ are candidates, but $q_3$ is in the current corral, so only $q_4$ may be added. ∎
Point $q_4$ enters, as the closest improving point to the origin.
Claim 7.
Point $q_2$ leaves. The next corral is $\{q_3, q_4\}$.
Proof of Claim.
We wish to find the closest point to the origin in the triangle $q_2 q_3 q_4$. From Figure 15, the optimum lies in the segment between $q_3$ and $q_4$; the coefficient of $q_2$ in this optimum is therefore zero, so $q_2$ must leave the corral. ∎
Claim 8.
The set of improving points is now $P(d)$ (with two zero coordinates appended).
Proof of Claim.
Now the hyperplane of Wolfe’s criterion contains the four points $q_1, q_2, q_3, q_4$ by construction, leaving $P(d)$ on the same side as the origin (see Figure 15). ∎
The first (and minimum norm) point of $P(d)$ enters, and the next corral is this point together with $q_3$ and $q_4$. That is, the next corral is precisely the first corral in $\Sigma(d)$, with $q_3$ and $q_4$ appended. We will prove inductively that the sequence of corrals from now on is exactly all of $\Sigma(d) + \{q_3, q_4\}$. To see this, we repeatedly invoke Lemma 5 after every corral, with $V$ equal to the subspace spanned by the first $d$ coordinate vectors of $\mathbb{R}^{d+2}$. Suppose that the current corral is $C \cup \{q_3, q_4\}$, where $C$ is one of the corrals in $\Sigma(d)$, and denote the next corral in $\Sigma(d)$ by $C'$. From Lemma 5, we get that the set of improving points for corral $C \cup \{q_3, q_4\}$ is obtained by taking the set of improving points for corral $C$ (in the execution on $P(d)$) and removing $q_3$ and $q_4$. Thus, the point that enters is the same one that would enter after corral $C$. Let $v$ denote that point.
Claim 9.
The next corral is $C' \cup \{q_3, q_4\}$.
Proof of Claim.
The current set of points is $C \cup \{v\} \cup \{q_3, q_4\}$. If $C \cup \{v\}$ is a corral, then so is $C \cup \{v\} \cup \{q_3, q_4\}$ (by Lemma 3, part 3) and the claim holds. If $C \cup \{v\}$ is not a corral, it is enough to prove that the sequence of points removed by the inner loop of Wolfe’s method on this set is the same as the sequence removed on the set $C \cup \{v\}$ in the execution on $P(d)$. We will show this now by simultaneously analyzing the execution of the inner loop on $C \cup \{v\}$ and on $C \cup \{v\} \cup \{q_3, q_4\}$. We distinguish the two cases with the following notation: variables are written without a bar ($x_i$) and with a bar ($\bar{x}_i$), respectively.
Let $x_1, x_2, \dots$ be the sequence of current points constructed by the inner loop on $C \cup \{v\}$. Let $r_1, r_2, \dots$ be the sequence of removed points. Let $S_1, S_2, \dots$ be the sequence of current sets of points at every iteration. Let $\bar{x}_1, \bar{x}_2, \dots$ be the corresponding sequence of current points on $C \cup \{v\} \cup \{q_3, q_4\}$, let $\bar{r}_1, \bar{r}_2, \dots$ be the corresponding sequence of removed points, and let $\bar{S}_1, \bar{S}_2, \dots$ be the corresponding sequence of current sets of points. We will show inductively that $r_i = \bar{r}_i$, that there is a one-to-one correspondence between the sequences $(S_i)$ and $(\bar{S}_i)$, and that $\bar{S}_i = S_i \cup \{q_3, q_4\}$. More specifically, the correspondence is realized by maintaining the following invariant in the inner loop: $\bar{x}_i$ is a strict convex combination of $x_i$ and the minimum norm point $w$ of the segment $[q_3, q_4]$.
For the base case, from Lemma 3, part 2, we have that $\bar{x}_1$ is a strict convex combination of $x_1$ and the minimum norm point in the segment $[q_3, q_4]$, namely $w$.
For the inductive step, if $x_i$ is a strict convex combination of the current set of points $S_i$, then so is $\bar{x}_i$ of $\bar{S}_i$, and the inner loop ends in both cases with corrals $S_i$ and $\bar{S}_i = S_i \cup \{q_3, q_4\}$, respectively; the claim holds. If $x_i$ is not a strict convex combination of the current set of points $S_i$, then neither is $\bar{x}_i$ of $\bar{S}_i$. The inner loop then continues by computing the minimum norm points $y_i$ and $\bar{y}_i$ in $\mathrm{aff}(S_i)$ and in $\mathrm{aff}(\bar{S}_i)$, respectively. It then finds the point $z_i$ in $\mathrm{conv}(S_i)$ that is closest to $y_i$ in the segment $[x_i, y_i]$; it finds $\bar{z}_i$, respectively. It then selects a point $r_i$ to be removed, and a point $\bar{r}_i$, respectively. From Lemma 3, part 2, we have that $\bar{y}_i$ is a strict convex combination of $y_i$ and $w$.
We will argue that is a strict convex combination of and . To see this, we note that segment lies in the hyperplane where the last coordinate is 0. Therefore we only need to intersect it with the part of that lies in that hyperplane. This part is exactly , which can be written more explicitly as the union of all segments of the form with . Moreover, we only need to look at the triangle , as all relevant segments lie on it. The intersection of this triangle with is segment , and therefore the intersection of the triangle with is simply triangle . This implies that the intersection between segment and is the same as the intersection between segment and triangle . This intersection is an interval where is a strict convex combination of and and is the closest point to in that intersection.
It follows that the set of potential points to be removed is the same for the two executions. Specifically, if is a strict convex combination of a certain subset of , then is a strict convex combination of . The sets of points that can potentially be removed are and (the same), respectively. In particular (under a mild consistency assumption on the way a point is chosen when there is more than one choice, for example, “choose the point with smallest index among potential points”), . This implies . Also, and is in . This completes the inductive argument about the inner loop and proves the claim. ∎
This completes the proof of Theorem 6. ∎
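For readers who want the loop structure of Wolfe’s method in executable form, the following is a minimal floating-point sketch (the function names, the numpy-based affine-minimizer step, and the tolerances are ours; the analysis above concerns exact arithmetic and a specific insertion rule, which this sketch does not reproduce):

```python
import numpy as np

def affine_minimizer(S):
    # Minimum-norm point in the affine hull of the columns of S (a d x k array),
    # returned together with its barycentric coefficients alpha (sum(alpha) = 1),
    # via the KKT system of min ||S alpha|| subject to 1^T alpha = 1.
    k = S.shape[1]
    M = np.block([[S.T @ S, np.ones((k, 1))],
                  [np.ones((1, k)), np.zeros((1, 1))]])
    rhs = np.zeros(k + 1)
    rhs[-1] = 1.0
    alpha = np.linalg.solve(M, rhs)[:k]
    return S @ alpha, alpha

def wolfe_min_norm(points, tol=1e-9, max_iter=1000):
    # Sketch of Wolfe's method: maintain a corral, insert a point that
    # violates optimality, and restore the corral property in the inner loop.
    P = np.array(points, dtype=float).T           # d x n, columns are the points
    j = int(np.argmin(np.sum(P * P, axis=0)))     # start from the shortest point
    corral, lam = [j], np.array([1.0])
    x = P[:, j].copy()
    for _ in range(max_iter):
        dots = x @ P
        jnew = int(np.argmin(dots))
        if dots[jnew] >= x @ x - tol:             # optimal iff <x,p> >= <x,x> for all p
            return x
        corral.append(jnew)                       # outer loop: insert a violating point
        lam = np.append(lam, 0.0)
        while True:                               # inner loop
            y, alpha = affine_minimizer(P[:, corral])
            if np.all(alpha > tol):               # y strictly inside: new corral found
                x, lam = y, alpha
                break
            # Move from x toward y until a coefficient hits zero; drop that point.
            neg = alpha <= tol
            theta = float(np.min(lam[neg] / (lam[neg] - alpha[neg])))
            lam = (1 - theta) * lam + theta * alpha
            drop = int(np.argmin(lam))
            corral.pop(drop)
            lam = np.delete(lam, drop)
            lam /= lam.sum()
    return x
```

On a triangle whose closest point to the origin lies on an edge, the sketch returns that edge point; when the origin lies inside the hull, it returns (numerically) the origin.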
2 Linear optimization reduces to minimum-norm problems on simplices
We reduce linear programming to the minimum norm point problem over a simplex via a series of strongly polynomial time reductions. The algorithmic problems we will consider are defined below. We give definitions for the problems of linear programming (LP), feasibility (FP), bounded feasibility (BFP), V-polytope membership (VPM), zero V-polytope membership (ZVPM), zero V-polytope membership decision (ZVPMD), distance to a V-polytope (DVP) and distance to a V-simplex (DVS). (The prefix “V-” means that the respective object is specified as the convex hull of a set of points.) See [18, 12, 19] for a detailed discussion of strongly polynomial time algorithms.
Definition 7.
Consider the following computational problems:
-
•
LP: Given a rational matrix , a rational column vector , and a rational row vector , output rational if is finite, otherwise output INFEASIBLE if is empty and else output INFINITE.
-
•
FP: Given a rational matrix and a rational vector , if is nonempty, output a rational , otherwise output NO.
-
•
BFP: Given a rational matrix , a rational vector and a rational value , if is nonempty, output a rational , otherwise output NO.
-
•
VPM: Given a rational matrix and a rational vector , if is nonempty, output a rational , otherwise output NO.
-
•
ZVPM: Given a rational matrix , if is nonempty, output a rational , otherwise output NO.
-
•
ZVPMD: Given rational points , output YES if and NO otherwise.
-
•
DVP: Given rational points defining , output .
-
•
DVS: Given affinely independent rational points defining -dimensional simplex , output .
The main result in this section reduces linear programming to finding the minimum norm point in a (vertex-representation) simplex.
Theorem 8.
LP reduces to DVS in strongly-polynomial time.
To prove each of the lemmas below, we describe the problem transformation and argue its strong polynomiality. The first two reductions are classical; the remaining ones are intuitive, but we do not believe they have been written down elsewhere.
Below is the sequence of algorithmic reductions that reduce LP to DVS.
Lemma 9.
LP reduces in strongly-polynomial time to FP.
Proof.
Let denote the FP oracle.
| (1) |
| (2) |
Claim 10.
Proof of Claim.
Suppose satisfies (1). Then . Define and note . Then . Now, suppose satisfies . Let be the positive coordinates of the vector and be the negative components in absolute value, so and . Define . Since , we have that and by construction, . Note that ∎
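The variable split used in the claim above is the standard decomposition of a free vector into its positive and negative parts; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def split_free_variable(x):
    # Write a free vector x as x = u - v with u, v >= 0, where u holds the
    # positive coordinates and v the negative coordinates in absolute value.
    u = np.maximum(x, 0.0)
    v = np.maximum(-x, 0.0)
    return u, v
```

By construction u and v are nonnegative, have disjoint supports, and satisfy u - v = x, which is exactly what the feasibility instance in the claim requires.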
Claim 11.
Proof of Claim.
Clearly, constructing the required FP problems takes strongly polynomial time and we have only two calls to , so the reduction is strongly-polynomial time. ∎
Lemma 10.
FP reduces in strongly-polynomial time to BFP.
Proof.
Let denote the oracle for BFP. Suppose , and define and . If the entry of , or the entry of , define and or and .
Claim 12.
The FP is feasible if and only if the BFP is feasible.
Proof of Claim.
If the BFP is feasible then clearly the FP is feasible. Suppose the FP is feasible. By the theory of minimal faces of polyhedra, we can reduce this to an FP defined by a square matrix, , in the following way: By [7, Theorem 1.1], there is a solution, , with no more than positive entries so that and the positive entries of combine linearly independent columns of to form . Let denote the matrix containing only these linearly independent columns and denote only the positive entries of . Then . Now, note that where . Since the column rank of equals the row rank of , we may remove linearly dependent rows of and the corresponding entries of , forming and so that where , and is a full-rank matrix.
Define and note that . Define and note that . Define and and note that and are integral. By Cramer’s rule, we know that where denotes with the th column replaced by . By integrality, , so . Now, note that defines a solution, , to the original system of equations. Let if the th column of was the selected th column of and if the th column of was not selected. Note then that . ∎
Thus, the FP and the BFP are equivalent. To see that this is a strongly-polynomial time reduction, note that adding the additional constraint only requires constructing the number , which takes comparisons and multiplications, and this number occupies space polynomial in the size of the input (polynomial in , and the sizes of , ). ∎
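In symbols, the Cramer’s-rule step of the argument above takes the following form (we write $\tilde A$, $\tilde b$ for the reduced full-rank square system, $M$ for the largest absolute value of an entry, and $n$ for its dimension; these names are illustrative, since the rendering above elides the original symbols):

```latex
% Cramer's rule for the reduced square system \tilde A x = \tilde b:
x_i \;=\; \frac{\det(\tilde A_i)}{\det(\tilde A)},
\qquad
|x_i| \;\le\; |\det(\tilde A_i)| \;\le\; \bigl(\sqrt{n}\,M\bigr)^{n},
```

where $\tilde A_i$ denotes $\tilde A$ with its $i$th column replaced by $\tilde b$; the first inequality uses $|\det(\tilde A)| \ge 1$, which holds by integrality, and the second is Hadamard's determinant bound.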
Lemma 11.
BFP reduces in strongly-polynomial time to VPM.
Proof.
Let denote the oracle for VPM.
| (3) |
Claim 13.
Proof of Claim.
Clearly, this reduction is simply a rewriting, so the reduction is strongly-polynomial time. ∎
Lemma 12.
VPM reduces in strongly-polynomial time to ZVPM.
Proof.
Let be the oracle for ZVPM.
| (4) |
Claim 14.
A solution to (4) gives a solution to the VPM instance and vice versa.
Proof of Claim.
Clearly, this reduction is simply a rewriting, so the reduction is strongly-polynomial time. ∎
Lemma 13.
ZVPM reduces in strongly-polynomial time to ZVPMD.
Proof idea. The reduction sequentially asks, for every vertex, whether it is redundant and, if so, removes it and continues. This process ends with at most vertices such that is a strict convex combination of them; in this resulting case the coefficients can be found by solving a linear system.
Proof.
Let denote the ZVPMD oracle.
Let be the resulting set of points after the loop in the reduction. Claim: contains at most points so that is a strict convex combination of (all of) them. Proof of claim: By Carathéodory’s theorem there is a subset of at most points so that is a strict convex combination of points in . We will see that is actually equal to . Suppose not, for the sake of contradiction. Let . At the time the loop in the reduction examines , no point in has been removed, and therefore is redundant and is removed. This is a contradiction. ∎
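The reduction loop can be sketched as follows. We implement the ZVPMD oracle as an LP feasibility check via scipy (a floating-point stand-in for the exact oracle; the function names are ours), then repeatedly drop redundant vertices and recover the coefficients by solving a linear system, as in the proof:

```python
import numpy as np
from scipy.optimize import linprog

def zvpmd(points):
    # ZVPMD oracle: decide whether 0 lies in conv(points), as an LP
    # feasibility problem in the convex-combination weights w >= 0.
    P = np.array(points, dtype=float).T            # d x n
    n = P.shape[1]
    A_eq = np.vstack([P, np.ones((1, n))])         # sum_i w_i p_i = 0, sum_i w_i = 1
    b_eq = np.append(np.zeros(P.shape[0]), 1.0)
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.status == 0                         # 0 = feasible

def zvpm_via_zvpmd(points):
    # Sketch of the reduction in Lemma 13: drop a vertex whenever 0 stays in
    # the hull of the remaining ones; by Caratheodory's theorem the survivors
    # make 0 a strict convex combination.
    if not zvpmd(points):
        return None                                # the "NO" answer of ZVPM
    pts = list(points)
    i = 0
    while i < len(pts):
        rest = pts[:i] + pts[i + 1:]
        if len(rest) > 0 and zvpmd(rest):
            pts = rest                             # vertex i was redundant
        else:
            i += 1
    # Recover the coefficients by solving the resulting linear system.
    P = np.array(pts, dtype=float).T
    A = np.vstack([P, np.ones((1, P.shape[1]))])
    b = np.append(np.zeros(P.shape[0]), 1.0)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pts, w
```

For the four vertices of a square centered at the origin, for example, the loop keeps only two opposite vertices, and the recovered weights are 1/2 each.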
In our next lemma, we make use of the following elementary fact.
Claim 15.
Given an matrix , let be with a row of s appended. The columns of are affinely independent if and only if the columns of are linearly independent. The convex hull of the columns of is full dimensional if and only if the rank of is .
Lemma 14.
ZVPMD reduces in strongly-polynomial time to DVS.
Proof.
Clearly ZVPMD reduces in strongly-polynomial time to DVP: Output YES if the distance is 0, output NO otherwise.
Given an instance of distance to a V-polytope, , we reduce it to an instance of DVS as follows: We lift the points to an affinely independent set in higher dimension, a simplex, by adding small-valued new coordinates. Claim 15 allows us to handle affine independence in matrix form. Let be the matrix having columns . Let be the rows of . Let be the all-ones vector. We want to add vectors , for some , so that is of rank . To this end, we construct an orthogonal basis (but not normalized, to preserve rationality) of the orthogonal complement of . The basis is obtained by applying the Gram-Schmidt orthogonalization procedure (without the normalization step) to the sequence . Denote the resulting orthogonal basis of the orthogonal complement of . The matrix with rows is of rank and so is the matrix with rows
for any (to be fixed later). Therefore, the columns of this matrix are linearly independent. Let be the matrix with rows
Let be the columns of . By construction and Claim 15, they are affinely independent. Let denote the convex hull of these -dimensional rational points. Polytope is a simplex. Moreover, if , then
(where we use that , from the Gram-Schmidt construction).
The reduction proceeds as follows: Let be the maximum of the absolute values of all numerators and denominators of entries in , which can be computed in strongly polynomial time. (Equivalently, without loss of generality we can assume that the input is integral, and then take to be the maximum of the absolute values of all entries in , as done in Schrijver’s [18, Section 15.2].) From Lemma 15, we have if . Compute rational so that . For example, let . The reduction queries for constructed as above and given by the choice of we just made. It then outputs YES if and NO otherwise. ∎
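The lifting step of this reduction can be sketched numerically as follows (floating-point Gram-Schmidt stands in for the exact rational computation in the proof, and the function name and tolerances are ours): append an all-ones row to the matrix of points, orthogonalize its rows followed by the standard basis vectors, and use the surviving residuals, scaled by a small epsilon, as the new coordinates.

```python
import numpy as np

def lift_to_simplex(points, eps):
    # Lift n points in R^d to affinely independent points in higher dimension
    # by appending eps-scaled coordinates from an (unnormalized) orthogonal
    # basis of the complement of the row space of [P; 1].
    P = np.array(points, dtype=float).T            # d x n, columns are the points
    d, n = P.shape
    A = np.vstack([P, np.ones((1, n))])            # append the all-ones row
    # Gram-Schmidt without normalization on the rows of A followed by the
    # standard basis vectors; residuals of the latter span the complement.
    ortho, basis = [], []
    for i, v in enumerate(list(A) + list(np.eye(n))):
        w = v.copy()
        for u in ortho:
            w = w - (w @ u) / (u @ u) * u
        if np.linalg.norm(w) > 1e-9:
            ortho.append(w)
            if i >= d + 1:                         # residual of a standard basis vector
                basis.append(w)
    lifted = np.vstack([P, eps * np.array(basis)]) # new eps-scaled coordinates
    return [lifted[:, j] for j in range(n)]
```

By the construction (and Claim 15), stacking the lifted points as columns and appending a row of ones yields a matrix of full column rank, i.e., the lifted points span a simplex.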
Lemma 15.
Let be a V-polytope with . Let be the maximum of the absolute values of all numerators and denominators of entries in . If then .
Proof.
The claim is clearly true if is empty. If is non-empty, let . We have that every facet of can be written as , where is an integral vector, is an integer, and the absolute values of the entries of as well as are less than ([13, Theorem 3.6]). By assumption, at least one of these facet inequalities is violated by 0. Denote by one such inequality. Let . We have , and . The claim follows. ∎
3 Conclusions and open questions
We have seen that Wolfe’s method using a natural point insertion rule exhibits exponential behavior. We have also shown that the minimum norm point problem for simplices is intimately related to the complexity of linear programming. Our work raises several very natural questions:
-
•
Are there exponential examples for other insertion rules for Wolfe’s method? Also, at the moment, the ordering of the points starts with the closest point to the origin, but one could also consider a randomized initial rule or a randomized insertion rule.
-
•
For applications in submodular function minimization, the polytopes one considers are base polytopes and our exponential example is not of this kind. Could there be hope that for base polytopes Wolfe’s method performs better?
-
•
It would be interesting to understand the average performance of Wolfe’s method. How does it behave for random data? Further randomized analysis of this method would include the smoothed analysis of Wolfe’s method or at least the behavior for data following a prescribed distribution.
-
•
We have seen that it is already quite interesting to study the minimum norm point problem for simplices, when we discussed the connection with linear programming. Is there a family of simplices where Wolfe’s method takes exponential time?
-
•
Can Wolfe’s method be extended to other convex norms for ?
Acknowledgements
We thank Gilberto Calvillo, Deeparnab Chakrabarty, Antoine Deza, Stephanie Jegelka, Matthias Köppe, Tamon Stephen, and John Sullivan for useful suggestions, conversations and comments about this project. This work was partially supported by NSF grant DMS-1440140, while the first and second authors were in residence at the Mathematical Sciences Research Institute in Berkeley, California, during the Fall 2017 semester. The first and second author were also partially supported by NSF grant DMS-1522158. The second author was also partially supported by the University of California, Davis Dissertation Fellowship. The third author was also partially supported by NSF grants CCF-1657939 and CCF-1422830 while the third author was in residence at the Simons Institute for the Theory of Computing.
References
- [1] Achim Bachem and Bernhard Korte. Minimum norm problems over transportation polytopes. Linear Algebra and its Applications, 31:103–118, 1980.
- [2] Imre Bárány and Shmuel Onn. Colourful linear programming and its relatives. Mathematics of Operations Research, 22(3):550–567, 1997.
- [3] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
- [4] Gilberto Calvillo and David Romero. On the closest point to the origin in transportation polytopes. Discrete Applied Mathematics, 210:88–102, 2016.
- [5] Deeparnab Chakrabarty, Prateek Jain, and Pravesh Kothari. Provable submodular minimization using wolfe’s algorithm. In Advances in Neural Information Processing Systems, pages 802–809, 2014.
- [6] Vijay Chandru, Abhi Dattasharma, S Sathiya Keerthi, NK Sancheti, and V Vinay. Algorithms for the optimal loading of recursive neural nets. In SODA, pages 342–349, 1995.
- [7] Sergei Nikolaevich Chernikov. The convolution of finite systems of linear inequalities. USSR Computational Mathematics and Mathematical Physics, 5(1):1–24, 1965.
- [8] Satoru Fujishige. Lexicographically optimal base of a polymatroid with respect to a weight vector. Math. Oper. Res., 5(2):186–196, 1980.
- [9] Satoru Fujishige, Takumi Hayashi, and Shigueo Isotani. The minimum-norm-point algorithm applied to submodular function minimization and linear programming. Kyoto University. Research Institute for Mathematical Sciences [RIMS], 2006.
- [10] Satoru Fujishige and Shigueo Isotani. A submodular function minimization algorithm based on the minimum-norm base. Pacific Journal of Optimization, 7:3–17, 2011.
- [11] Dongdong Ge, Xiaoye Jiang, and Yinyu Ye. A note on the complexity of minimization. Mathematical Programming, 129(2):285–299, Oct 2011.
- [12] Martin Grötschel, László Lovász, and Alexander Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer-Verlag, Berlin, 1988.
- [13] Martin Grötschel, László Lovász, and Alexander Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.
- [14] Victor Klee and George J. Minty. How good is the simplex algorithm? In Inequalities, III (Proc. Third Sympos., Univ. California, Los Angeles, Calif., 1969; dedicated to the memory of Theodore S. Motzkin), pages 159–175. Academic Press, New York, 1972.
- [15] Mikhail K. Kozlov, Sergei P. Tarasov, and Leonid G. Khachiyan. The polynomial solvability of convex quadratic programming. USSR Computational Mathematics and Mathematical Physics, 20(5):223–228, 1980.
- [16] Kiyohito Nagano, Yoshinobu Kawahara, and Kazuyuki Aihara. Size-constrained submodular minimization through minimum norm base. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 977–984, 2011.
- [17] George L. Nemhauser and Laurence A. Wolsey. Integer and Combinatorial Optimization. Wiley, 1988.
- [18] Alexander Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, Chichester, England, 1998.
- [19] Alexander Schrijver. Combinatorial optimization. Polyhedra and efficiency. Vol. A, volume 24 of Algorithms and Combinatorics. Springer-Verlag, Berlin, 2003. Paths, flows, matchings, Chapters 1–38.
- [20] Steve Smale. Mathematical problems for the next century. The Mathematical Intelligencer, 20(2):7–15, 1998.
- [21] Tamás Terlaky and Shuzhong Zhang. Pivot rules for linear programming: A survey on recent theoretical developments. Annals of Operations Research, 46(1):203–233, 1993.
- [22] László A. Végh. A strongly polynomial algorithm for a class of minimum-cost flow problems with separable convex objectives. SIAM J. Comput., 45(5):1729–1761, 2016.
- [23] Philip Wolfe. Algorithm for a least-distance programming problem. Math. Programming Stud., 1:190–205, 1974. Pivoting and extensions.
- [24] Philip Wolfe. Finding the nearest point in a polytope. Mathematical Programming, 11(1):128–149, 1976.